
# Vision-Language Models

Test Push
Apache-2.0
distilvit is an image-to-text model that pairs a ViT image encoder with a distilled GPT-2 text decoder to generate textual descriptions of images.
Image-to-Text, Transformers
tarekziade
17
0
MMICL Instructblip T5 Xxl
MIT
MMICL is a multimodal vision-language model built on BLIP-2/InstructBLIP that can analyze and understand multiple images while following instructions.
Image-to-Text, Transformers, English
BleachNick
156
11
© 2025 AIbase